Abstract 

Sequences of events describing the behavior and actions of users or systems 
can be collected in several domains. We consider the problem of discovering 
frequently occurring episodes in such sequences. An episode is defined to 
be a collection of events that occur relatively close to each other in a given 
partial order. Once such episodes are known, one can produce rules for 
describing or predicting the behavior of the sequence. We give efficient 
algorithms for the discovery of all frequent episodes from a given class of 
episodes, and present extensive experimental results. The methods are in 
use in telecommunication alarm management. 

1 Introduction 

Most data mining and machine learning techniques are adapted towards the 
analysis of unordered collections of data. However, there are important ap- 
plication areas where the data to be analyzed consists of a sequence of events. 
Examples of such data are alarms in a telecommunication network, user in- 
terface actions, crimes committed by a person, occurrences of recurrent ill- 
nesses, etc. Recently, interest in knowledge discovery from sequential data 
has increased: see, e.g., [5, 8, 17, 19, 24]. 

Abstractly, such data can be viewed as a sequence of events, where each 
event has an associated time of occurrence. An example of an event sequence 
is represented in Figure 1. Here A, B, C, E, and F are event types, e.g., 
different types of alarms from a telecommunication network, or different types 
of user actions, and they have been marked on a time line. 

One basic problem in analyzing such a sequence is to find frequent
episodes, i.e., collections of events occurring frequently together. For example,
in the sequence of Figure 1, the episode "E is followed by F" occurs several
times, even when the sequence is viewed through a narrow window.
Episodes, in general, are partially ordered sets of events. From the sequence in 
the figure one can make, for instance, the observation that whenever A and 
B occur (in either order), C occurs soon. 

When discovering episodes in a telecommunication network alarm log, 
the goal is to find relationships between alarms. Such relationships can then 
be used in an on-line analysis of the incoming alarm stream, e.g., to better 
explain the problems that cause alarms, to suppress redundant alarms, and 
to predict severe faults. 

In this paper we consider the following problem. Given a class of episodes
and an input sequence of events, find all episodes that occur frequently
in the event sequence. We describe the framework and formalize the
knowledge discovery task in Section 2. Algorithms for discovering all frequent
episodes are given in Section 3. They are based on the idea of first finding
small frequent episodes, and then progressively looking for larger frequent
episodes. Additionally, the algorithms use some simple pattern matching
ideas to speed up the recognition of occurrences of single episodes. Section 4
outlines an alternative way of approaching the problem, based on locating
minimal occurrences of episodes. Experimental results using both approaches
and with various data sets are presented in Section 5. We discuss extensions
and review related work in Section 6. Section 7 is a short conclusion.

Figure 1: A sequence of events.

2 Event sequences and episodes 

Our overall goal is to analyze sequences of events, and to discover recurrent 
combinations of events, which we call frequent episodes. We first formulate 
the concept of event sequence, and then look at episodes in more detail. 

2.1 Event sequences 

We consider the input as a sequence of events, where each event has an 
associated time of occurrence. Given a set $E$ of event types, an event is a
pair $(A, t)$, where $A \in E$ is an event type and $t$ is an integer, the (occurrence)
time of the event. The event type can actually contain several attributes; for
simplicity we consider here the event type as a single value.

An event sequence $\mathbf{s}$ on $E$ is a triple $(s, T_s, T_e)$, where

$$s = \langle (A_1, t_1), (A_2, t_2), \ldots, (A_n, t_n) \rangle$$

is an ordered sequence of events such that $A_i \in E$ for all $i = 1, \ldots, n$, and
$t_i \le t_{i+1}$ for all $i = 1, \ldots, n - 1$. Further on, $T_s < T_e$ are integers, $T_s$ is
called the starting time and $T_e$ the ending time, and $T_s \le t_i < T_e$ for all
$i = 1, \ldots, n$.

Figure 2: The example event sequence and two windows of width 5.

Example 1 Figure 2 presents graphically the event sequence $\mathbf{s} = (s, 29, 68)$,
where

$$s = \langle (E, 31), (D, 32), (F, 33), (A, 35), (B, 37), (C, 38), \ldots, (D, 67) \rangle.$$

Observations of the event sequence have been made from time 29 to just 
before time 68. For each event that occurred in the time interval [29, 68), the 
event type and the time of occurrence have been recorded. □ 

In the analysis of sequences we are interested in finding all frequent epis- 
odes from a class of episodes. To be considered interesting, the events of 
an episode must occur close enough in time. The user defines how close is 
close enough by giving the width of the time window within which the epis- 
ode must occur. We define a window as a slice of an event sequence, and 
we then consider an event sequence as a sequence of partially overlapping 
windows. In addition to the width of the window, the user specifies in how 
many windows an episode has to occur to be considered frequent. 

Formally, a window on event sequence $\mathbf{s} = (s, T_s, T_e)$ is an event sequence
$\mathbf{w} = (w, t_s, t_e)$, where $t_s < T_e$, $t_e > T_s$, and $w$ consists of those pairs $(A, t)$
from $s$ where $t_s \le t < t_e$. The time span $t_e - t_s$ is called the width of the
window $\mathbf{w}$, and it is denoted $\mathit{width}(\mathbf{w})$. Given an event sequence $\mathbf{s}$ and an
integer win, we denote by $W(\mathbf{s}, \mathit{win})$ the set of all windows $\mathbf{w}$ on $\mathbf{s}$ such that
$\mathit{width}(\mathbf{w}) = \mathit{win}$.

By the definition the first and last windows on a sequence extend outside
the sequence, so that the first window contains only the first time point of
the sequence, and the last window contains only the last time point. With
this definition an event close to either end of a sequence is observed in equally
many windows as an event in the middle of the sequence. Given an event
sequence $\mathbf{s} = (s, T_s, T_e)$ and a window width win, the number of windows in
$W(\mathbf{s}, \mathit{win})$ is $T_e - T_s + \mathit{win} - 1$.

Example 2 Figure 2 shows two windows of width 5 on the sequence s of 
the previous example. A window starting at time 35 is shown in solid line, 
and the immediately following window, starting at time 36, is depicted with 
a dashed line. The window starting at time 35 is 

$$(\langle (A, 35), (B, 37), (C, 38), (E, 39) \rangle, 35, 40).$$

Note that the event (F, 40) that occurred at the ending time is not in the 
window. The window starting at 36 is similar to this one; the difference is 
that the first event $(A, 35)$ is missing and there is a new event $(F, 40)$ at the 
end. 

The set of the 43 partially overlapping windows of width 5 constitutes
$W(\mathbf{s}, 5)$; the first window is $(\emptyset, 25, 30)$, and the last is $(\langle (D, 67) \rangle, 67, 72)$.
Event $(D, 67)$ occurs in 5 windows of width 5, as does, e.g., event $(C, 50)$. □
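
The window definition can be checked mechanically. The following Python sketch is an illustration only and not part of the original method; the function name `windows` and the list-of-pairs representation of events are assumptions made for this example. It enumerates the windows of width 5 over the time interval [29, 68), using only the events around time 35 that are spelled out in Example 2.

```python
from typing import Iterator, List, Tuple

Event = Tuple[str, int]  # (event type, occurrence time)

def windows(events: List[Event], t_start: int, t_end: int,
            win: int) -> Iterator[Tuple[List[Event], int, int]]:
    """Yield every window (contents, ts, te) of width `win` on the sequence.

    Window starts run from t_start - win + 1 to t_end - 1, so the first
    window covers only the first time point and the last window only the
    last time point of the sequence.
    """
    for ts in range(t_start - win + 1, t_end):
        te = ts + win
        # A window holds the events with ts <= t < te (end time excluded).
        contents = [(a, t) for (a, t) in events if ts <= t < te]
        yield contents, ts, te

# Hypothetical excerpt: only the events named in Example 2.
events = [("A", 35), ("B", 37), ("C", 38), ("E", 39), ("F", 40)]
all_windows = list(windows(events, 29, 68, 5))

print(len(all_windows))   # 43, i.e. 68 - 29 + 5 - 1
print(all_windows[0])     # ([], 25, 30)
```

Each event falls into exactly win windows under this definition, regardless of how close it lies to either end of the sequence.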

2.2 Episodes 

Informally, an episode is a partially ordered collection of events occurring
together. Episodes can be described as directed acyclic graphs. Consider,
for instance, episodes $\alpha$, $\beta$, and $\gamma$ in Figure 3. Episode $\alpha$ is a serial episode:
it occurs in a sequence only if there are events of types E and F that occur
in this order in the sequence. In the sequence there can be other events
occurring between these two. The alarm sequence, for instance, is merged
from several sources, and therefore it is useful that episodes are insensitive
to intervening events. Episode $\beta$ is a parallel episode: no constraints on the
relative order of A and B are given. Episode $\gamma$ is an example of a non-serial
and non-parallel episode: it occurs in a sequence if there are occurrences of
A and B and these precede an occurrence of C; no constraints on the relative
order of A and B are given. We mostly consider the discovery of serial and
parallel episodes.

Figure 3: Episodes $\alpha$, $\beta$, and $\gamma$.

We now define episodes formally. An episode $\alpha$ is a triple $(V, \le, g)$ where
$V$ is a set of nodes, $\le$ is a partial order on $V$, and $g : V \to E$ is a mapping
associating each node with an event type. The interpretation of an episode
is that the events in $g(V)$ have to occur in the order described by $\le$. The
size of $\alpha$, denoted $|\alpha|$, is $|V|$. Episode $\alpha$ is parallel if the partial order $\le$ is
trivial (i.e., $x \not\le y$ for all $x, y \in V$ such that $x \ne y$). Episode $\alpha$ is serial if the
relation $\le$ is a total order (i.e., $x \le y$ or $y \le x$ for all $x, y \in V$). Episode $\alpha$
is injective if the mapping $g$ is an injection, i.e., no event type occurs twice
in the episode.

Example 3 Consider episode $\alpha = (V, \le, g)$ in Figure 3. The set $V$ contains
two nodes; say $x$ and $y$. The mapping $g$ labels these nodes with the event
types that are seen in the figure: $g(x) = E$ and $g(y) = F$. An event of type
E is supposed to occur before an event of type F, i.e., $x$ precedes $y$, and we
have $x \le y$. Episode $\alpha$ is injective, since it does not contain duplicate event
types; in a window where $\alpha$ occurs there may, however, be multiple events
of types E and F. □

We next define when an episode is a subepisode of another; this relation
is used extensively in the algorithms for discovering all frequent episodes. An
episode $\beta = (V', \le', g')$ is a subepisode of $\alpha = (V, \le, g)$, denoted $\beta \preceq \alpha$, if
there exists an injective mapping $f : V' \to V$ such that $g'(v) = g(f(v))$ for
all $v \in V'$, and for all $v, w \in V'$ with $v \le' w$ also $f(v) \le f(w)$. An episode
$\alpha$ is a superepisode of $\beta$ if and only if $\beta \preceq \alpha$. We write $\beta \prec \alpha$ if $\beta \preceq \alpha$ and
$\alpha \not\preceq \beta$.

Example 4 From Figure 3 we see that $\beta \preceq \gamma$ since $\beta$ is a subgraph of $\gamma$.
In terms of the definition, there is a mapping $f$ that connects the nodes
labeled A with each other and the nodes labeled B with each other, i.e.,
both nodes of $\beta$ have (disjoint) corresponding nodes in $\gamma$. Since the nodes in
episode $\beta$ are not ordered, the corresponding nodes in $\gamma$ do not need to be
ordered, either. □

Consider now what it means that an episode occurs in a sequence. The 
nodes of the episode need to have corresponding events in the sequence such 
that the event types are the same and the partial order of the episode is 
respected. An episode $\alpha = (V, \le, g)$ occurs in an event sequence

$$\mathbf{s} = (\langle (A_1, t_1), (A_2, t_2), \ldots, (A_n, t_n) \rangle, T_s, T_e),$$

if there exists an injective mapping $h : V \to \{1, \ldots, n\}$ from nodes to events,
such that $g(x) = A_{h(x)}$ for all $x \in V$, and for all $x, y \in V$ with $x \ne y$ and
$x \le y$ we have $t_{h(x)} < t_{h(y)}$.

Example 5 The window $(w, 35, 40)$ of Figure 2 contains events A, B, C,
and E. Episodes $\beta$ and $\gamma$ of Figure 3 occur in the window, but $\alpha$ does not. □
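
For parallel and serial episodes the occurrence test is simple enough to sketch directly. The Python fragment below is an illustration under our own assumptions: episodes are given as sequences of event types, and the helper names `occurs_parallel` and `occurs_serial` are hypothetical. It tests the window of Figure 2 that starts at time 35.

```python
from collections import Counter

def occurs_parallel(episode, window_events):
    """A parallel episode (a multiset of event types) occurs if the window
    holds at least as many events of each type as the episode requires."""
    have = Counter(a for a, _ in window_events)
    need = Counter(episode)
    return all(have[a] >= k for a, k in need.items())

def occurs_serial(episode, window_events):
    """A serial episode (a list of event types) occurs if its types can be
    matched to events with strictly increasing times. Always taking the
    earliest admissible event is sufficient for this test."""
    last_time = None
    for wanted in episode:
        times = [t for a, t in window_events
                 if a == wanted and (last_time is None or t > last_time)]
        if not times:
            return False
        last_time = min(times)
    return True

window = [("A", 35), ("B", 37), ("C", 38), ("E", 39)]
print(occurs_parallel(["A", "B"], window))  # True: both A and B are present
print(occurs_serial(["E", "F"], window))    # False: no F in this window
```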
Algorithm 1

Input: A set $E$ of event types, an event sequence $\mathbf{s}$ over $E$, a set $\mathcal{E}$ of
episodes, a window width win, a frequency threshold min_fr, and a confidence
threshold min_conf.

Output: The episode rules that hold in $\mathbf{s}$ with respect to win, min_fr, and
min_conf.

Method:

1.  /* Find frequent episodes (Algorithm 2): */
2.  compute $\mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$;
3.  /* Generate rules: */
4.  for all $\alpha \in \mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$ do
5.      for all $\beta \prec \alpha$ do
6.          if $fr(\alpha)/fr(\beta) \ge \mathit{min\_conf}$ then
7.              output the rule $\beta \to \alpha$ and the confidence $fr(\alpha)/fr(\beta)$;



We define the frequency of an episode as the fraction of windows in
which the episode occurs. That is, given an event sequence $\mathbf{s}$ and a window
width win, the frequency of an episode $\alpha$ in $\mathbf{s}$ is

$$fr(\alpha, \mathbf{s}, \mathit{win}) = \frac{|\{ \mathbf{w} \in W(\mathbf{s}, \mathit{win}) \mid \alpha \text{ occurs in } \mathbf{w} \}|}{|W(\mathbf{s}, \mathit{win})|}.$$

Given a frequency threshold min_fr, $\alpha$ is frequent if $fr(\alpha, \mathbf{s}, \mathit{win}) \ge \mathit{min\_fr}$.
The task we are interested in is to discover all frequent episodes
from a given class $\mathcal{E}$ of episodes. The class could be, e.g., all parallel
episodes or all serial episodes. We denote the collection of frequent episodes with
respect to $\mathbf{s}$, win and min_fr by $\mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$.
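
As a reference point, the frequency of a parallel episode can be computed directly from the definition by testing every window. The sketch below is a deliberately naive illustration; the function name `frequency` and the data layout are our assumptions, and the incremental algorithms of Section 3 exist precisely to avoid this window-by-window work.

```python
from collections import Counter

def frequency(episode, events, t_start, t_end, win):
    """Fraction of the windows of width `win` in which the parallel episode
    (a tuple of event types) occurs, computed by brute force."""
    need = Counter(episode)
    n_windows = t_end - t_start + win - 1
    hits = 0
    for ts in range(t_start - win + 1, t_end):
        # Event types present in the window [ts, ts + win).
        have = Counter(a for a, t in events if ts <= t < ts + win)
        if all(have[a] >= k for a, k in need.items()):
            hits += 1
    return hits / n_windows

# Hypothetical excerpt of the example sequence (events around time 35 only).
events = [("A", 35), ("B", 37), ("C", 38), ("E", 39), ("F", 40)]
print(frequency(("A", "B"), events, 29, 68, 5))  # 3 of the 43 windows
```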

Once the frequent episodes are known, they can be used to obtain rules
that describe connections between events in the given event sequence. For
example, if we know that the episode $\beta$ of Figure 3 occurs in 4.2 % of the
windows and that the superepisode $\gamma$ occurs in 4.0 % of the windows, we can
estimate that after seeing a window with A and B, there is a chance of about
0.95 that C follows in the same window. Such rules show the connections
between events more clearly than frequent episodes alone. Algorithm 1 shows
how rules and their confidences can be computed from the frequencies of
episodes. Note that indentation is used in the algorithms to specify the
extent of loops and conditional statements.

Algorithm 2

Input: A set $E$ of event types, an event sequence $\mathbf{s}$ over $E$, a set $\mathcal{E}$ of
episodes, a window width win, and a frequency threshold min_fr.
Output: The collection $\mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$ of frequent episodes.
Method:

1.  compute $\mathcal{C}_1 := \{\alpha \in \mathcal{E} \mid |\alpha| = 1\}$;
2.  $l := 1$;
3.  while $\mathcal{C}_l \ne \emptyset$ do
4.      /* Database pass (Algorithms 4 and 5): */
5.      compute $\mathcal{F}_l := \{\alpha \in \mathcal{C}_l \mid fr(\alpha, \mathbf{s}, \mathit{win}) \ge \mathit{min\_fr}\}$;
6.      $l := l + 1$;
7.      /* Candidate generation (Algorithm 3): */
8.      compute $\mathcal{C}_l := \{\alpha \in \mathcal{E} \mid |\alpha| = l$ and for all $\beta \in \mathcal{E}$ such that $\beta \prec \alpha$ and
9.          $|\beta| < l$ we have $\beta \in \mathcal{F}_{|\beta|}\}$;
10. for all $l$ do output $\mathcal{F}_l$;
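
The rule generation step can be sketched in a few lines for the special case of parallel episodes, where the proper subepisodes of an episode are exactly its proper sub-multisets. The code is an illustration under stated assumptions: the function name `episode_rules`, the tuple representation, and all frequencies other than the 4.2 % and 4.0 % quoted above are hypothetical.

```python
from itertools import combinations

def episode_rules(freq, min_conf):
    """freq maps a parallel episode (a sorted tuple of event types) to its
    frequency. Returns (beta, alpha, confidence) for every rule
    beta -> alpha whose confidence fr(alpha) / fr(beta) reaches min_conf."""
    rules = []
    for alpha, fr_alpha in freq.items():
        # Proper, non-empty subepisodes of a parallel episode.
        subs = {sub for k in range(1, len(alpha))
                for sub in combinations(alpha, k)}
        for beta in sorted(subs):
            if beta in freq and fr_alpha / freq[beta] >= min_conf:
                rules.append((beta, alpha, fr_alpha / freq[beta]))
    return rules

# Hypothetical frequencies; only 0.042 and 0.040 come from the text.
freq = {
    ("A",): 0.10, ("B",): 0.08, ("C",): 0.09,
    ("A", "B"): 0.042,
    ("A", "B", "C"): 0.040,
}
rules = episode_rules(freq, 0.9)
print(rules)  # a single rule, (A, B) -> (A, B, C), with confidence about 0.95
```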



3 Algorithms 

Given all frequent episodes, the rule generation is straightforward. We now
concentrate on the following discovery task: given an event sequence $\mathbf{s}$, a
set $\mathcal{E}$ of episodes, a window width win, and a frequency threshold min_fr,
find $\mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$. We give first a specification of the algorithm and
then exact methods for its subtasks. We call these methods collectively the
WINEPI algorithm.

3.1 Main algorithm 

Algorithm 2 computes the collection $\mathcal{F}(\mathbf{s}, \mathit{win}, \mathit{min\_fr})$ of frequent episodes
from a class $\mathcal{E}$ of episodes. The algorithm performs a levelwise (breadth-first)
search in the episode lattice spanned by the subepisode relation. The search
starts from the most general episodes, i.e., episodes with only one event. On
each level the algorithm first computes a collection of candidate episodes,
and then checks their frequencies from the event sequence database. The
crucial point in the candidate generation is given by the following immediate
lemma.

Lemma 6 If an episode $\alpha$ is frequent in an event sequence $\mathbf{s}$, then all
subepisodes $\beta \preceq \alpha$ are frequent.

The collection of candidates is specified to consist of episodes such that 
all smaller subepisodes are frequent. This criterion safely prunes from
consideration episodes that cannot be frequent. More detailed methods for the 
candidate generation and database pass phases are given in the following 
subsections. 
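
To make the levelwise search concrete, the following Python sketch runs it for parallel episodes. It is a simplified illustration, not the implementation described in this paper: the database pass is done by brute force over all windows, candidate generation is a naive stand-in for Algorithm 3, and the names `frequency` and `frequent_parallel_episodes` are ours. It assumes min_fr > 0, since otherwise the search over non-injective episodes would not terminate.

```python
from collections import Counter
from itertools import combinations

def frequency(episode, events, t_start, t_end, win):
    """Brute-force frequency of a parallel episode (tuple of event types)."""
    need = Counter(episode)
    hits = 0
    for ts in range(t_start - win + 1, t_end):
        have = Counter(a for a, t in events if ts <= t < ts + win)
        hits += all(have[a] >= k for a, k in need.items())
    return hits / (t_end - t_start + win - 1)

def frequent_parallel_episodes(events, t_start, t_end, win, min_fr):
    """Levelwise search; returns {episode: frequency}. Assumes min_fr > 0."""
    types = sorted({a for a, _ in events})
    candidates = [(a,) for a in types]
    frequent = {}
    level = 1
    while candidates:
        # Database pass: keep the candidates that reach the threshold.
        level_frequent = {}
        for c in candidates:
            fr = frequency(c, events, t_start, t_end, win)
            if fr >= min_fr:
                level_frequent[c] = fr
        frequent.update(level_frequent)
        level += 1
        # Candidate generation: extend frequent episodes by one event type
        # and keep an extension only if every subepisode of size
        # level - 1 is frequent (the pruning justified by Lemma 6).
        extended = sorted({tuple(sorted(f + (a,)))
                           for f in level_frequent for a in types})
        candidates = [c for c in extended
                      if all(sub in level_frequent
                             for sub in combinations(c, level - 1))]
    return frequent

# Hypothetical excerpt of the example sequence (events around time 35 only).
events = [("A", 35), ("B", 37), ("C", 38), ("E", 39), ("F", 40)]
result = frequent_parallel_episodes(events, 29, 68, 5, 0.06)
print(sorted(result, key=lambda e: (len(e), e)))
```

With this toy input the search stops at size 3: the candidate {B, C, E, F} is never counted because its subepisode {B, C, F} is not frequent.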

3.2 Generation of candidate episodes 

We present now a candidate generation method in detail. The method can 
be easily adapted to deal with the classes of parallel episodes, serial episodes, 
and injective parallel and serial episodes. 

Algorithm 3 computes candidates for parallel episodes. In the algorithm,
an episode $\alpha = (V, \le, g)$ is represented as a lexicographically sorted array of
event types. The array is denoted by the name of the episode and the items
in the array are referred to with the square bracket notation. For example,
a parallel episode $\alpha$ with events of types A, C, C, and F is represented as an
array $\alpha$ with $\alpha[1] = A$, $\alpha[2] = C$, $\alpha[3] = C$, and $\alpha[4] = F$. Collections of
episodes are also represented as lexicographically sorted arrays, i.e., the $i$th
episode of a collection $\mathcal{F}$ is denoted by $\mathcal{F}[i]$.

Since the episodes and episode collections are sorted, all episodes that
share the same first event types are consecutive in the episode collection. In
particular, if episodes $\mathcal{F}_l[i]$ and $\mathcal{F}_l[j]$ of size $l$ share the first $l - 1$ events,
then for all $k$ with $i \le k \le j$ we have that $\mathcal{F}_l[k]$ shares also the same
events. A maximal sequence of consecutive episodes of size $l$ that share the
first $l - 1$ events is called a block. Potential candidates can be identified by
creating all combinations of two episodes in the same block. For the efficient
identification of blocks, we store in $\mathcal{F}_l$.block_start[$j$] for each episode $\mathcal{F}_l[j]$
the $i$ such that $\mathcal{F}_l[i]$ is the first episode in the block.

Algorithm 3

Input: A sorted array $\mathcal{F}_l$ of frequent parallel episodes of size $l$.
Output: A sorted array of candidate parallel episodes of size $l + 1$.
Method:

1.  $\mathcal{C}_{l+1} := \emptyset$;
2.  $k := 0$;
3.  if $l = 1$ then for $h := 1$ to $|\mathcal{F}_l|$ do $\mathcal{F}_l$.block_start[$h$] := 1;
4.  for $i := 1$ to $|\mathcal{F}_l|$ do
5.      current_block_start := $k + 1$;
6.      for ($j := i$; $\mathcal{F}_l$.block_start[$j$] = $\mathcal{F}_l$.block_start[$i$]; $j := j + 1$) do
7.          /* $\mathcal{F}_l[i]$ and $\mathcal{F}_l[j]$ have $l - 1$ first event types in common,
8.          build a potential candidate $\alpha$ as their combination: */
9.          for $x := 1$ to $l$ do $\alpha[x] := \mathcal{F}_l[i][x]$;
10.         $\alpha[l + 1] := \mathcal{F}_l[j][l]$;
11.         /* Build and test subepisodes $\beta$ that do not contain $\alpha[y]$: */
12.         for $y := 1$ to $l - 1$ do
13.             for $x := 1$ to $y - 1$ do $\beta[x] := \alpha[x]$;
14.             for $x := y$ to $l$ do $\beta[x] := \alpha[x + 1]$;
15.             if $\beta$ is not in $\mathcal{F}_l$ then continue with the next $j$ at line 6;
16.         /* All subepisodes are in $\mathcal{F}_l$, store $\alpha$ as candidate: */
17.         $k := k + 1$;
18.         $\mathcal{C}_{l+1}[k] := \alpha$;
19.         $\mathcal{C}_{l+1}$.block_start[$k$] := current_block_start;
20. output $\mathcal{C}_{l+1}$;

Algorithm 3 can be easily modified to generate candidate serial episodes.
Now the events in the array representing an episode are in the order imposed
by a total order $\le$. For instance, a serial episode $\beta$ with events of types
C, A, F, and C, in that order, is represented as an array $\beta$ with $\beta[1] = C$,
$\beta[2] = A$, $\beta[3] = F$, and $\beta[4] = C$. By replacing line 6 by

6.  for ($j := \mathcal{F}_l$.block_start[$i$]; $\mathcal{F}_l$.block_start[$j$] = $\mathcal{F}_l$.block_start[$i$]; $j := j + 1$) do

Algorithm 3 generates candidates for serial episodes.

There are further options with the algorithm. If the desired episode class
consists of parallel or serial injective episodes, i.e., no episode should contain
any event type more than once, simply insert line

6b. if $j = i$ then continue with the next $j$ at line 6;

after line 6.

The time complexity of Algorithm 3 is polynomial in the size of the col- 
lection of frequent episodes and it is independent of the length of the event 
sequence. 

Theorem 1 Algorithm 3 (with any of the above variations) has time
complexity $O(l^2\, |\mathcal{F}_l|^2 \log |\mathcal{F}_l|)$.

Proof The initialization (line 3) takes time $O(|\mathcal{F}_l|)$. The outer loop (line 4)
is iterated $O(|\mathcal{F}_l|)$ times and the inner loop (line 6) $O(|\mathcal{F}_l|)$ times. Within the
loops, a potential candidate (lines 9 and 10) and $l - 1$ subcandidates (lines 12
to 14) are built in time $O(l + 1 + (l - 1)l) = O(l^2)$. More importantly, the
$l - 1$ subsets need to be searched for in the collection $\mathcal{F}_l$ (line 15). Since
$\mathcal{F}_l$ is sorted, each subcandidate can be located with binary search in time
$O(l \log |\mathcal{F}_l|)$. The total time complexity is thus

$$O(|\mathcal{F}_l| + |\mathcal{F}_l|\,|\mathcal{F}_l|\,(l^2 + (l - 1)\, l \log |\mathcal{F}_l|)) = O(l^2\, |\mathcal{F}_l|^2 \log |\mathcal{F}_l|).$$ □

In practical situations the time complexity is likely to be close to
$O(l^2\, |\mathcal{F}_l| \log |\mathcal{F}_l|)$, since the blocks are typically small.
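
The block-based candidate generation for parallel episodes can be sketched compactly in Python. This is a hedged illustration rather than a transcription of Algorithm 3: indices are 0-based, a hash set replaces the binary search of line 15, and the helper `block_starts`, which recomputes block boundaries from shared prefixes, is our own convenience. The `injective` flag corresponds to the optional skip of the case j = i.

```python
def block_starts(freq):
    """For each episode in the sorted list, the index of the first episode
    that shares all but its last event type (the start of its block)."""
    starts = []
    for j, ep in enumerate(freq):
        if j > 0 and ep[:-1] == freq[j - 1][:-1]:
            starts.append(starts[-1])
        else:
            starts.append(j)
    return starts

def generate_candidates(freq, injective=False):
    """freq: sorted list of frequent parallel episodes of size l (tuples).
    Returns (candidates of size l + 1, their block starts)."""
    if not freq:
        return [], []
    l = len(freq[0])
    known = set(freq)
    starts = block_starts(freq)
    cands, cand_starts = [], []
    for i in range(len(freq)):
        current_block_start = len(cands)
        j = i
        while j < len(freq) and starts[j] == starts[i]:
            if not (injective and j == i):
                # Combine two episodes of the same block.
                alpha = freq[i] + (freq[j][-1],)
                # Test the subepisodes that omit one of the first l - 1
                # events; the two remaining ones are freq[i] and freq[j].
                if all(alpha[:y] + alpha[y + 1:] in known
                       for y in range(l - 1)):
                    cands.append(alpha)
                    cand_starts.append(current_block_start)
            j += 1
    return cands, cand_starts

pairs = [("A", "B"), ("B", "C"), ("B", "E"), ("C", "E"), ("C", "F"), ("E", "F")]
print(generate_candidates(pairs))
# ([('B', 'C', 'E'), ('C', 'E', 'F')], [0, 1])
```

Because candidates are produced in the order of the sorted input, the output is itself sorted and no separate sorting step is needed.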

3.3 Recognizing episodes in sequences 

Let us now consider the implementation of the database pass. We give
algorithms which recognize episodes in sequences in an incremental fashion.
For two windows $\mathbf{w} = (w, t_s, t_s + \mathit{win})$ and $\mathbf{w}' = (w', t_s + 1, t_s + \mathit{win} + 1)$, the
sequences $w$ and $w'$ of events are similar to each other. We take advantage of
this similarity: after recognizing episodes in $w$, we make incremental updates
in our data structures to achieve the shift of the window to obtain $w'$.

The algorithms start by considering the empty window just before the
input sequence, and they end after considering the empty window just after
the sequence. This way the incremental methods need no other special
actions at the beginning or end. When computing the frequency of episodes,
only the windows correctly on the input sequence are, of course, considered.

3.3.1 Parallel episodes 

Algorithm 4 recognizes candidate parallel episodes in an event sequence. The
main ideas of the algorithm are the following. For each candidate parallel
episode $\alpha$ we maintain a counter $\alpha$.event_count that indicates how many
events of $\alpha$ are present in the window. When $\alpha$.event_count becomes equal
to $|\alpha|$, indicating that $\alpha$ is entirely included in the window, we save the
starting time of the window in $\alpha$.inwindow. When $\alpha$.event_count decreases
again, indicating that $\alpha$ is no longer entirely in the window, we increase the
field $\alpha$.freq_count by the number of windows where $\alpha$ remained entirely in
the window. At the end, $\alpha$.freq_count contains the total number of windows
where $\alpha$ occurs.

To access candidates efficiently, they are indexed by the number of events
of each type that they contain: all episodes that contain exactly $a$ events of
type $A$ are in the list contains($A$, $a$). When the window is shifted and the
contents of the window change, the episodes that are affected are updated.
If, for instance, there is one event of type $A$ in the window and a second
one comes in, all episodes in the list contains($A$, 2) are updated with the
information that both events of type $A$ they are expecting are now present.






Algorithm 4

Input: A collection $\mathcal{C}$ of parallel episodes, an event sequence $\mathbf{s} = (s, T_s, T_e)$,
a window width win, and a frequency threshold min_fr.
Output: The episodes of $\mathcal{C}$ that are frequent in $\mathbf{s}$ with respect to win
and min_fr.
Method:

1.  /* Initialization: */
2.  for each $\alpha$ in $\mathcal{C}$ do
3.      for each $A$ in $\alpha$ do
4.          $A$.count := 0;
5.          for $i := 1$ to $|\alpha|$ do contains($A$, $i$) := $\emptyset$;
6.  for each $\alpha$ in $\mathcal{C}$ do
7.      for each $A$ in $\alpha$ do
8.          $a$ := number of events of type $A$ in $\alpha$;
9.          contains($A$, $a$) := contains($A$, $a$) $\cup$ $\{\alpha\}$;
10.     $\alpha$.event_count := 0;
11.     $\alpha$.freq_count := 0;
12. /* Recognition: */
13. for start := $T_s$ - win + 1 to $T_e$ do
14.     /* Bring in new events to the window: */
15.     for all events $(A, t)$ in $s$ such that $t$ = start + win - 1 do
16.         $A$.count := $A$.count + 1;
17.         for each $\alpha \in$ contains($A$, $A$.count) do
18.             $\alpha$.event_count := $\alpha$.event_count + $A$.count;
19.             if $\alpha$.event_count = $|\alpha|$ then $\alpha$.inwindow := start;
20.     /* Drop out old events from the window: */
21.     for all events $(A, t)$ in $s$ such that $t$ = start - 1 do
22.         for each $\alpha \in$ contains($A$, $A$.count) do
23.             if $\alpha$.event_count = $|\alpha|$ then
24.                 $\alpha$.freq_count := $\alpha$.freq_count - $\alpha$.inwindow + start;
25.             $\alpha$.event_count := $\alpha$.event_count - $A$.count;
26.         $A$.count := $A$.count - 1;
27. /* Output: */
28. for all episodes $\alpha$ in $\mathcal{C}$ do
29.     if $\alpha$.freq_count/($T_e - T_s$ + win - 1) $\ge$ min_fr then output $\alpha$;
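
The incremental counting scheme translates almost line by line into Python. The sketch below is our own illustration, with dictionaries standing in for the fields $\alpha$.event_count, $\alpha$.inwindow and $\alpha$.freq_count; the function name `count_parallel` is hypothetical, and candidates are assumed to be distinct tuples of event types. It returns window counts rather than applying the frequency threshold.

```python
from collections import Counter, defaultdict

def count_parallel(candidates, events, t_start, t_end, win):
    """Return {episode: number of windows of width win containing it},
    updating the state incrementally as the window slides."""
    by_time = defaultdict(list)
    for a, t in events:
        by_time[t].append(a)
    # contains[(A, k)]: the episodes that need exactly k events of type A.
    contains = defaultdict(list)
    for alpha in candidates:
        for a, k in Counter(alpha).items():
            contains[(a, k)].append(alpha)
    count = Counter()                    # events of each type in the window
    event_count = {alpha: 0 for alpha in candidates}
    freq_count = {alpha: 0 for alpha in candidates}
    inwindow = {}
    for start in range(t_start - win + 1, t_end + 1):
        # Bring in the events at the new last time point of the window.
        for a in by_time.get(start + win - 1, []):
            count[a] += 1
            for alpha in contains.get((a, count[a]), []):
                event_count[alpha] += count[a]
                if event_count[alpha] == len(alpha):
                    inwindow[alpha] = start
        # Drop the events that fell out at the front of the window.
        for a in by_time.get(start - 1, []):
            for alpha in contains.get((a, count[a]), []):
                if event_count[alpha] == len(alpha):
                    freq_count[alpha] += start - inwindow[alpha]
                event_count[alpha] -= count[a]
            count[a] -= 1
    return freq_count

# Hypothetical excerpt of the example sequence (events around time 35 only).
events = [("A", 35), ("B", 37), ("C", 38), ("E", 39), ("F", 40)]
counts = count_parallel([("A", "B"), ("B", "C", "E"), ("A", "F")],
                        events, 29, 68, 5)
print(counts)  # {('A', 'B'): 3, ('B', 'C', 'E'): 3, ('A', 'F'): 0}
```

Dividing each count by $T_e - T_s + \mathit{win} - 1$ gives the frequency that is compared against min_fr on line 29.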



3.3.2 Serial episodes 

Serial candidate episodes are recognized in an event sequence by using state
automata that accept the candidate episodes and ignore all other input. The
idea is that there is an automaton for each serial episode $\alpha$, and that there
can be several instances of each automaton at the same time, so that the
active states reflect the (disjoint) prefixes of $\alpha$ occurring in the window.
Algorithm 5 implements this idea.

We initialize a new instance of the automaton for a serial episode $\alpha$ every
time the first event of $\alpha$ comes into the window; the automaton is removed
when the same event leaves the window. When an automaton for $\alpha$ reaches
its accepting state, indicating that $\alpha$ is entirely included in the window, and
if there are no other automata for $\alpha$ in the accepting state already, we save
the starting time of the window in $\alpha$.inwindow. When an automaton in the
accepting state is removed, and if there are no other automata for $\alpha$ in the
accepting state, we increase the field $\alpha$.freq_count by the number of windows
where $\alpha$ remained entirely in the window.

It is useless to have multiple automata in the same state, as they would
only make the same transitions and produce the same information. It suffices
to maintain the one that reached the common state last since it will be
also removed last. There are thus at most $|\alpha|$ automata for an episode $\alpha$.
For each automaton we need to know when it should be removed. We can
thus represent all the automata for $\alpha$ with one array of size $|\alpha|$: the value
of $\alpha$.initialized[$i$] is the latest initialization time of an automaton that has
reached its $i$th state. Recall that $\alpha$ itself is represented by an array containing
its events; this array can be used to label the state transitions.

To access and traverse the automata efficiently they are organized in the
following way. For each event type $A \in E$, the automata that accept $A$
are linked together to a list waits($A$). The list contains entries of the form
$(\alpha, x)$ meaning that episode $\alpha$ is waiting for its $x$th event. When an event
$(A, t)$ enters the window during a shift, the list waits($A$) is traversed. If an
automaton reaches a common state $i$ with another automaton, the earlier
entry $\alpha$.initialized[$i$] is simply overwritten.

The transitions made during one shift of the window are stored in a list
transitions. They are represented in the form $(\alpha, x, t)$ meaning that episode
$\alpha$ got its $x$th event, and the latest initialization time of the prefix of length $x$
is $t$. Updates regarding the old states of the automata are done immediately,
but updates for the new states are done only after all transitions have been
identified, in order to not overwrite any useful information. For easy removal
of automata when they go out of the window, the automata initialized at
time $t$ are stored in a list beginsat($t$).

Algorithm 5

Input: A collection $\mathcal{C}$ of serial episodes, an event sequence $\mathbf{s} = (s, T_s, T_e)$, a
window width win, and a frequency threshold min_fr.
Output: The episodes of $\mathcal{C}$ that are frequent in $\mathbf{s}$ with respect to win
and min_fr.
Method:

1.  /* Initialization: */
2.  for each $\alpha$ in $\mathcal{C}$ do
3.      for $i := 1$ to $|\alpha|$ do
4.          $\alpha$.initialized[$i$] := 0;
5.          waits($\alpha[i]$) := $\emptyset$;
6.  for each $\alpha \in \mathcal{C}$ do
7.      waits($\alpha[1]$) := waits($\alpha[1]$) $\cup$ $\{(\alpha, 1)\}$;
8.      $\alpha$.freq_count := 0;
9.  for $t := T_s$ - win to $T_s$ - 1 do beginsat($t$) := $\emptyset$;
10. /* Recognition: */
11. for start := $T_s$ - win + 1 to $T_e$ do
12.     /* Bring in new events to the window: */
13.     beginsat(start + win - 1) := $\emptyset$;
14.     transitions := $\emptyset$;
15.     for all events $(A, t)$ in $s$ such that $t$ = start + win - 1 do
16.         for all $(\alpha, j) \in$ waits($A$) do
17.             if $j = |\alpha|$ and $\alpha$.initialized[$j$] = 0 then $\alpha$.inwindow := start;
18.             if $j = 1$ then
19.                 transitions := transitions $\cup$ $\{(\alpha, 1,$ start + win - 1$)\}$;
20.             else
21.                 transitions := transitions $\cup$ $\{(\alpha, j, \alpha$.initialized[$j - 1$]$)\}$;
22.                 beginsat($\alpha$.initialized[$j - 1$]) :=
23.                     beginsat($\alpha$.initialized[$j - 1$]) $\setminus$ $\{(\alpha, j - 1)\}$;
24.                 $\alpha$.initialized[$j - 1$] := 0;
25.                 waits($A$) := waits($A$) $\setminus$ $\{(\alpha, j)\}$;
26.     for all $(\alpha, j, t) \in$ transitions do
27.         $\alpha$.initialized[$j$] := $t$;
28.         beginsat($t$) := beginsat($t$) $\cup$ $\{(\alpha, j)\}$;
29.         if $j < |\alpha|$ then waits($\alpha[j + 1]$) := waits($\alpha[j + 1]$) $\cup$ $\{(\alpha, j + 1)\}$;
30.     /* Drop out old events from the window: */
31.     for all $(\alpha, l) \in$ beginsat(start - 1) do
32.         if $l = |\alpha|$ then $\alpha$.freq_count := $\alpha$.freq_count - $\alpha$.inwindow + start;
33.         else waits($\alpha[l + 1]$) := waits($\alpha[l + 1]$) $\setminus$ $\{(\alpha, l + 1)\}$;
34.         $\alpha$.initialized[$l$] := 0;
35. /* Output: */
36. for all episodes $\alpha$ in $\mathcal{C}$ do
37.     if $\alpha$.freq_count/($T_e - T_s$ + win - 1) $\ge$ min_fr then output $\alpha$;
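
The central idea of the serial case, keeping for each prefix length only the latest initialization time, can be illustrated without the waits and beginsat lists. The Python sketch below is a simplified variant and not Algorithm 5 itself: it visits every candidate at every shift, copies initialization times forward instead of moving automata, and decides occurrence by comparing the latest start of the full episode with the window start. The name `count_serial` and the array `latest` are our own.

```python
from collections import defaultdict

def count_serial(candidates, events, t_start, t_end, win):
    """Return {episode: number of windows of width win containing it} for
    serial episodes given as tuples of event types in their total order."""
    by_time = defaultdict(list)
    for a, t in events:
        by_time[t].append(a)
    # latest[alpha][j]: latest start time of an occurrence of the first j
    # events of alpha seen so far (index 0 is unused).
    latest = {alpha: [None] * (len(alpha) + 1) for alpha in candidates}
    freq_count = {alpha: 0 for alpha in candidates}
    for start in range(t_start - win + 1, t_end):
        now = start + win - 1            # last time point of this window
        arriving = by_time.get(now, [])
        for alpha in candidates:
            state = latest[alpha]
            if arriving:
                # Snapshot, so that events with equal times cannot
                # extend one another (the order must be strict).
                old = list(state)
                for a in arriving:
                    for j in range(1, len(alpha) + 1):
                        if alpha[j - 1] != a:
                            continue
                        if j == 1:
                            state[1] = now
                        elif old[j - 1] is not None and (
                                state[j] is None or old[j - 1] > state[j]):
                            state[j] = old[j - 1]
            full = state[len(alpha)]
            # The episode lies in [start, start + win) exactly when its
            # latest occurrence starts no earlier than the window does.
            if full is not None and full >= start:
                freq_count[alpha] += 1
    return freq_count

# Hypothetical excerpt of the example sequence (events around time 35 only).
events = [("A", 35), ("B", 37), ("C", 38), ("E", 39), ("F", 40)]
counts = count_serial([("E", "F"), ("A", "B", "C"), ("F", "E")],
                      events, 29, 68, 5)
print(counts)  # {('E', 'F'): 4, ('A', 'B', 'C'): 2, ('F', 'E'): 0}
```

This variant has no indexing by awaited event type, so it does not reproduce the savings that Algorithm 5 obtains from touching only the automata that wait for the incoming event.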

3.3.3 Analysis of time complexity 

For simplicity, suppose that the class of event types E is fixed, and assume 
that exactly one event takes place every time unit. Assume candidate
episodes are all of size $l$, and let $n$ be the length of the sequence.

Theorem 2 The time complexity of Algorithm 4 is $O((n + l^2)\, |\mathcal{C}|)$.

Proof Initialization takes time $O(|\mathcal{C}|\, l^2)$. Consider now the number of the
operations in the innermost loops, i.e., accesses to $\alpha$.event_count on lines 18
and 25. In the recognition phase there are $O(n)$ shifts of the window. In
each shift, one new event comes into the window, and one old event leaves
the window. Thus, for any episode $\alpha$, $\alpha$.event_count is accessed at most
twice during one shift. The cost of the recognition phase is thus $O(n\, |\mathcal{C}|)$. □

In practice the size $l$ of episodes is very small with respect to the size
$n$ of the sequence, and the time required for the initialization can be safely
neglected. For injective episodes we have the following tighter result.

Theorem 3 The time complexity of recognizing injective parallel episodes
in Algorithm 4 (excluding initialization) is $O(\frac{n}{\mathit{win}}\, |\mathcal{C}|\, l + n)$.

Proof Consider win successive shifts of one time unit. During such a sequence
of shifts, each of the $|\mathcal{C}|$ candidate episodes $\alpha$ can undergo at most $2l$ changes:
any event type $A$ can have $A$.count increased to 1 and decreased to 0 at most
once. This is due to the fact that after an event of type $A$ has come into the
window, $A$.count $\ge 1$ for the next win time units. Reading the input takes
time $n$. □
This time bound should be contrasted with the time usage of a trivial
non-incremental method where the sequence is pre-processed into windows,
and then frequent sets are searched for. The time requirement for recognizing
$|\mathcal{C}|$ candidate sets in $n$ windows, plus the time required to read in $n$ windows
of size win, is $O(n\, |\mathcal{C}|\, l + n \cdot \mathit{win})$, i.e., larger by a factor of win.

Theorem 4 The time complexity of Algorithm 5 is $O(n\, |\mathcal{C}|\, l)$.

Proof The initialization takes time $O(|\mathcal{C}|\, l + \mathit{win})$. In the recognition phase,
again, there are $O(n)$ shifts, and in each shift one event comes into the
window and one event leaves the window. In one shift, the effort per
episode $\alpha$ depends on the number of automata accessed; there are a maximum
of $l$ automata for each episode. The worst-case time complexity is thus
$O(|\mathcal{C}|\, l + \mathit{win} + n\, |\mathcal{C}|\, l) = O(n\, |\mathcal{C}|\, l)$ (note that win is $O(n)$). □

In the worst case the input sequence consists of events of only one event
type, and the candidate serial episodes consist only of events of that
particular type. Every shift of the window results now in an update in every
automaton. This worst-case complexity is close to the complexity of the
trivial non-incremental method, $O(n\, |\mathcal{C}|\, l + n \cdot \mathit{win})$. In practical situations,
however, the time requirement is considerably smaller, and we approach the
savings obtained in the case of injective parallel episodes.

Theorem 5 The time complexity of recognizing injective serial episodes in
Algorithm 5 (excluding initialization) is $O(n\, |\mathcal{C}|)$.

Proof Each of the $O(n)$ shifts can now affect at most two automata for
each episode: when an event comes into the window there can be a state
transition in at most one automaton, and at most one automaton can be
removed because the initializing event goes out of the window. □

Figure 4: Recursive composition of a complex episode.

3.4 General partial orders 

So far we have only discussed serial and parallel episodes. We next discuss
briefly the use of other partial orders in episodes. The recognition of an
arbitrary episode can be reduced to the recognition of a hierarchical
combination of serial and parallel episodes. For example, episode $\gamma$ in Figure 4
is a serial combination of two episodes: a parallel episode $\delta'$ consisting of
A and B, and an episode $\delta''$ consisting of C alone. The occurrence of an
episode in a window can be tested using such hierarchical structure: to see
whether episode $\gamma$ occurs in a window one checks (using a method for serial
episodes) whether the subepisodes $\delta'$ and $\delta''$ occur in this order; to check the
occurrence of $\delta'$ one uses a method for parallel episodes to verify whether A
and B occur.

There are, however, some complications one has to take into account. 
First, it is sometimes necessary to duplicate an event node to obtain a de- 
composition to serial and parallel episodes. Duplication works easily with 
injective episodes, but non-injective episodes need more complex methods. 
Another important aspect is that composite events have a duration, unlike 
the elementary events in E. 

A practical alternative is to handle all episodes basically like parallel
episodes, and to check the correct partial ordering only when all events are
in the window. Parallel episodes can be located efficiently; after they have
been found, checking the correct partial ordering is relatively fast.
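
As a minimal sketch of this alternative (not the implementation used in the
experiments), the following Python fragment first applies the parallel test
and then searches for an assignment of events that respects the partial
order. The function name `occurs_in_window`, the representation of an episode
as a dictionary of nodes plus a set of ordered node pairs, and the requirement
that ordered events have strictly increasing times are assumptions made for
illustration only; the sketch also assumes an injective episode.

```python
from itertools import product

def occurs_in_window(window, nodes, order):
    """Test whether an injective episode occurs in one window.

    window: list of (event_type, time) pairs inside the window.
    nodes:  dict mapping a node name to its event type (hypothetical encoding).
    order:  set of (u, v) node pairs; u must occur strictly before v (assumed).
    """
    times = {}
    for node, etype in nodes.items():
        times[node] = [t for e, t in window if e == etype]
        if not times[node]:
            return False  # parallel test fails: some event type is missing
    names = list(nodes)
    # Only reached when all event types are present: check the ordering.
    for combo in product(*(times[n] for n in names)):
        at = dict(zip(names, combo))
        if all(at[u] < at[v] for u, v in order):
            return True
    return False

# An episode shaped like the one in Figure 4: A and B in parallel, then C.
gamma_nodes = {"x": "A", "y": "B", "z": "C"}
gamma_order = {("x", "z"), ("y", "z")}
print(occurs_in_window([("B", 1), ("A", 2), ("C", 3)], gamma_nodes, gamma_order))
```

The exhaustive search over combinations is exponential in the episode size in
the worst case; it is acceptable here only because the ordering check is run
on the few windows that already pass the parallel test.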

4 An alternative approach: minimal occurrences

4.1 Outline of the approach

In this section we describe an alternative approach to the discovery of epis- 
odes. Instead of looking at the windows and only considering whether an 
episode occurs in a window or not, we now look at the exact occurrences of 
episodes and the relationships between those occurrences. One of the ad- 
vantages of this new approach is that focusing on the occurrences of episodes 
allows us to more easily find rules with two window widths, one for the left- 
hand side and one for the whole rule, such as "if A and B occur within 15 
seconds, then C follows within 30 seconds". 

The approach is based on minimal occurrences of episodes. Besides the
new rule formulation, the use of minimal occurrences gives rise to the
following new method, called MINEPI, for the recognition of episodes in the
input sequence. For each frequent episode we store information about the
locations of its minimal occurrences. In the recognition phase we can then
compute the locations of minimal occurrences of a candidate episode $\alpha$
as a temporal join of the minimal occurrences of two subepisodes of $\alpha$.
In addition to being simple and efficient, this formulation has the advantage
that the confidences and frequencies of rules with a large number of
different window widths can be obtained quickly, i.e., there is no need to
rerun the analysis if one only wants to modify the window widths. In the case
of complicated episodes, the time needed for recognizing the occurrence of an
episode can be significant; the use of stored minimal occurrences of episodes
eliminates unnecessary repetition of the recognition effort.

We identify minimal occurrences with their time intervals in the following
way. Given an episode $\alpha$ and an event sequence $s$, we say that the
interval $[t_s, t_e)$ is a minimal occurrence of $\alpha$ in $s$, if
(1) $\alpha$ occurs in the window $\mathbf{w} = (w, t_s, t_e)$ on $s$, and if
(2) $\alpha$ does not occur in any proper subwindow on $\mathbf{w}$, i.e.,
not in any window $\mathbf{w}' = (w', t'_s, t'_e)$ on $s$ such that
$t_s \le t'_s$, $t'_e \le t_e$, and
$width(\mathbf{w}') < width(\mathbf{w})$. The set of (intervals of) minimal
occurrences of an episode $\alpha$ in a given event sequence is denoted by
$mo(\alpha)$:
$mo(\alpha) = \{\, [t_s, t_e) \mid [t_s, t_e) \text{ is a minimal occurrence of } \alpha \,\}$.

Example 7 Consider the event sequence $s$ in Figure 2 and the episodes in
Figure 3. The parallel episode $\beta$ consisting of event types A and B has
four minimal occurrences in $s$:
$mo(\beta) = \{[35, 38), [46, 48), [47, 58), [57, 60)\}$.
The partially ordered episode $\gamma$ has the following three minimal
occurrences: $[35, 39)$, $[46, 51)$, $[57, 62)$. □
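
The definition can be made concrete with a brute-force sketch in Python for
parallel episodes whose event types are all distinct. The function name
`minimal_occurrences_parallel`, the small event sequence, and the use of
integer time stamps (so that an occurrence whose last event is at time $t$
ends at $t + 1$) are assumptions made for this illustration; the sequence is
not the one in Figure 2.

```python
def minimal_occurrences_parallel(events, types):
    """Return the minimal occurrences of a parallel episode, by brute force.

    events: list of (event_type, time) pairs sorted by integer time.
    types:  set of distinct event types forming the parallel episode.
    Result: sorted list of half-open intervals (t_s, t_e).
    """
    needed = set(types)
    candidates = set()
    for i in range(len(events)):
        seen = set()
        for j in range(i, len(events)):
            if events[j][0] in needed:
                seen.add(events[j][0])
            if seen == needed:
                # Shortest occurrence starting at event i.
                candidates.add((events[i][1], events[j][1] + 1))
                break
    # Keep only intervals that contain no other candidate interval.
    return sorted(
        (s, e) for (s, e) in candidates
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e
                   for (s2, e2) in candidates)
    )

# Hypothetical sequence: A at 1, B at 3, C at 4, B at 6, A at 7.
events = [("A", 1), ("B", 3), ("C", 4), ("B", 6), ("A", 7)]
print(minimal_occurrences_parallel(events, {"A", "B"}))  # [(1, 4), (6, 8)]
```

The interval $[3, 8)$ also contains both A and B, but it is not minimal,
since the subwindow $[6, 8)$ already contains an occurrence.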

An episode rule is an expression $\beta[win_1] \Rightarrow \alpha[win_2]$,
where $\beta$ and $\alpha$ are episodes such that $\beta \preceq \alpha$, and
$win_1$ and $win_2$ are integers. The informal interpretation of the rule is
that if episode $\beta$ has a minimal occurrence at interval $[t_s, t_e)$
with $t_e - t_s \le win_1$, then episode $\alpha$ occurs at interval
$[t_s, t'_e)$ for some $t'_e$ such that $t'_e - t_s \le win_2$. Formally this
can be expressed in the following way. Given $win_1$ and $\beta$, denote

$$mo_{win_1}(\beta) = \{\, [t_s, t_e) \in mo(\beta) \mid t_e - t_s \le win_1 \,\}.$$

Further, given $\alpha$ and an interval $[u_s, u_e)$, define
$occ(\alpha, [u_s, u_e)) = \text{true}$ if and only if there exists a minimal
occurrence $[u'_s, u'_e) \in mo(\alpha)$ such that $u_s \le u'_s$ and
$u'_e \le u_e$. The confidence of an episode rule
$\beta[win_1] \Rightarrow \alpha[win_2]$ is now

$$\frac{|\{\, [t_s, t_e) \in mo_{win_1}(\beta) \mid occ(\alpha, [t_s, t_s + win_2)) \,\}|}{|mo_{win_1}(\beta)|}.$$

Example 8 Continuing the previous example, we have, e.g., the following
rules and confidences. For the rule $\beta[3] \Rightarrow \gamma[4]$ we have
$|\{[35, 38), [46, 48), [57, 60)\}|$ in the denominator and
$|\{[35, 38)\}|$ in the numerator, so the confidence is 1/3. For the rule
$\beta[3] \Rightarrow \gamma[5]$ the confidence is 1. □
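
The two confidences of Example 8 can be checked mechanically. The Python
sketch below applies the definitions of $mo_{win_1}$, $occ$, and confidence
directly to the interval sets listed in Example 7; the function names `occ`
and `confidence` and the tuple encoding of intervals are our own choices for
the illustration.

```python
from fractions import Fraction

def occ(mo_alpha, u_s, u_e):
    """True iff some minimal occurrence of alpha lies inside [u_s, u_e)."""
    return any(u_s <= a and b <= u_e for a, b in mo_alpha)

def confidence(mo_beta, mo_alpha, win1, win2):
    """Confidence of the rule beta[win1] => alpha[win2], from the definition."""
    lhs = [(s, e) for s, e in mo_beta if e - s <= win1]
    if not lhs:
        return None  # rule has no qualifying left-hand side occurrences
    hits = sum(1 for s, _ in lhs if occ(mo_alpha, s, s + win2))
    return Fraction(hits, len(lhs))

# Minimal occurrences from Example 7, as half-open intervals (t_s, t_e).
mo_beta = [(35, 38), (46, 48), (47, 58), (57, 60)]
mo_gamma = [(35, 39), (46, 51), (57, 62)]

print(confidence(mo_beta, mo_gamma, 3, 4))  # 1/3
print(confidence(mo_beta, mo_gamma, 3, 5))  # 1
```

With $win_2 = 4$ only $[35, 39)$ fits into its target interval $[35, 39)$;
the occurrences $[46, 51)$ and $[57, 62)$ have width 5 and therefore need
$win_2 = 5$.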



Note that since $\beta$ is a subepisode of $\alpha$, the rule right-hand side
$\alpha$ contains information about the relative location of each event in
it, so the "new" events in the rule right-hand side can actually be required
to be positioned, e.g., between events in the left-hand side. There are also
a number of possible definitions for the temporal relationship between the
intervals. For instance, rules that point backwards in time can be defined in
a similar way. For brevity, we only consider this one case.

We defined the frequency of an episode as the fraction of windows that
contain the episode. While frequency has a nice interpretation as the
probability that a randomly chosen window contains the episode, the concept
is not very useful with minimal occurrences: (1) there is no fixed window
size, and (2) a window may contain several minimal occurrences of an episode.
Instead of frequency, we use the concept of support, the number of minimal
occurrences of an episode: the support of an episode $\alpha$ in a given
event sequence $s$ is $|mo(\alpha)|$. Similarly to a frequency threshold, we
now use a threshold for the support: given a support threshold $min\_sup$, an
episode $\alpha$ is frequent if $|mo(\alpha)| \ge min\_sup$.

The current episode rule discovery task can be stated as follows. Given
an event sequence $s$, a class $\mathcal{E}$ of episodes, and a set $W$ of
time bounds, find all frequent episode rules of the form
$\beta[win_1] \Rightarrow \alpha[win_2]$, where
$\beta, \alpha \in \mathcal{E}$, $\beta \preceq \alpha$, and
$win_1, win_2 \in W$.

4.2 Finding minimal occurrences of episodes 

In this section we describe informally the collection MINEPI of algorithms
that locate the minimal occurrences of frequent serial and parallel episodes.
Let us start with some observations about the basic properties of episodes.
Lemma 6 still holds: the subepisodes of a frequent episode are frequent. Thus
we can use the main algorithm (Algorithm 2) and the candidate generation
(Algorithm 3) also for MINEPI. We have the following results about the
minimal occurrences of an episode also containing minimal occurrences of its
subepisodes.

Lemma 9 Assume $\alpha$ is an episode and $\beta \preceq \alpha$ is its
subepisode. If $[t_s, t_e) \in mo(\alpha)$, then $\beta$ occurs in
$[t_s, t_e)$ and hence there is an interval $[u_s, u_e) \in mo(\beta)$
such that $t_s \le u_s < u_e \le t_e$.

Lemma 10 Let $\alpha$ be a serial episode of size $k$, and let
$[t_s, t_e) \in mo(\alpha)$. Then there are subepisodes $\alpha_1$ and
$\alpha_2$ of $\alpha$ of size $k - 1$ such that
$[t_s, t_e^1) \in mo(\alpha_1)$ for some $t_e^1 < t_e$ and
$[t_s^2, t_e) \in mo(\alpha_2)$ for some $t_s^2 > t_s$.

Lemma 11 Let $\alpha$ be a parallel episode of size $k$, and let
$[t_s, t_e) \in mo(\alpha)$. Then there are subepisodes $\alpha_1$ and
$\alpha_2$ of $\alpha$ of size $k - 1$ such that
$[t_s^1, t_e^1) \in mo(\alpha_1)$ and $[t_s^2, t_e^2) \in mo(\alpha_2)$ for
some $t_s^1, t_e^1, t_s^2, t_e^2 \in [t_s, t_e]$, and furthermore
$t_s = \min\{t_s^1, t_s^2\}$ and $t_e = \max\{t_e^1, t_e^2\}$.

The minimal occurrences of a candidate episode $\alpha$ are located in the
following way. In the first iteration of the main algorithm, $mo(\alpha)$ is
computed from the input sequence for all episodes $\alpha$ of size 1. In the
rest of the iterations, the minimal occurrences of a candidate $\alpha$ are
located by first selecting two suitable subepisodes $\alpha_1$ and $\alpha_2$
of $\alpha$, and then computing a temporal join between the minimal
occurrences of $\alpha_1$ and $\alpha_2$, in the spirit of Lemmas 10 and 11.

To be more specific, for serial episodes the two subepisodes are selected
so that $\alpha_1$ contains all events except the last one and $\alpha_2$ in
turn contains all except the first one. The minimal occurrences of $\alpha$
are then found with the following specification:

$$mo(\alpha) = \{\, [t_s, u_e) \mid \text{there are } [t_s, t_e) \in mo(\alpha_1) \text{ and } [u_s, u_e) \in mo(\alpha_2) \text{ such that } t_s < u_s,\ t_e < u_e, \text{ and } [t_s, u_e) \text{ is minimal} \,\}.$$

For parallel episodes, the subepisodes $\alpha_1$ and $\alpha_2$ contain all
events except one; the omitted events must be different. See Lemma 11 for the
idea of how to compute the minimal occurrences of $\alpha$.
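
The specification for serial episodes can be transcribed almost literally
into Python. The sketch below is a naive quadratic join written for clarity,
not a single linear pass over the two lists; the function name `join_serial`
and the sample intervals are hypothetical.

```python
def join_serial(mo_a1, mo_a2):
    """Temporal join for a serial episode.

    mo_a1: minimal occurrences of alpha_1 (all events except the last).
    mo_a2: minimal occurrences of alpha_2 (all events except the first).
    Intervals are half-open pairs (start, end).
    """
    # All combined intervals [t_s, u_e) allowed by the specification.
    candidates = {
        (t_s, u_e)
        for (t_s, t_e) in mo_a1
        for (u_s, u_e) in mo_a2
        if t_s < u_s and t_e < u_e
    }
    # Keep only those that contain no other candidate, i.e. the minimal ones.
    return sorted(
        (s, e) for (s, e) in candidates
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e
                   for (s2, e2) in candidates)
    )

# Hypothetical sequence: A@1, B@2, B@4, C@5, A@6, B@7, C@9.
mo_ab = [(1, 3), (6, 8)]    # minimal occurrences of "A then B"
mo_bc = [(4, 6), (7, 10)]   # minimal occurrences of "B then C"
print(join_serial(mo_ab, mo_bc))  # [(1, 6), (6, 10)]
```

The pair $[1, 3)$ and $[7, 10)$ also satisfies the two inequalities, but the
resulting interval $[1, 10)$ contains $[1, 6)$ and is discarded by the
minimality condition.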

The minimal occurrences of a candidate episode $\alpha$ can be found in a
linear pass over the minimal occurrences of the selected subepisodes
$\alpha_1$ and $\alpha_2$. The time required for one candidate is thus
$O(|mo(\alpha_1)| + |mo(\alpha_2)| + |mo(\alpha)|)$, which is $O(n)$, where
$n$ is the length of the event sequence. To optimize the running time,
$\alpha_1$ and $\alpha_2$ can be selected so that
$|mo(\alpha_1)| + |mo(\alpha_2)|$ is minimized.

The space requirement of the algorithm can be expressed as
$\sum_i \sum_{\alpha \in \mathcal{F}_i} |mo(\alpha)|$, assuming the minimal
occurrences of all frequent episodes are stored, or alternatively as
$\max_i \bigl(\sum_{\alpha \in \mathcal{F}_i \cup \mathcal{F}_{i+1}} |mo(\alpha)|\bigr)$,
if only the current and next levels of minimal occurrences are stored. The
size of $\sum_{\alpha \in \mathcal{F}_1} |mo(\alpha)|$ is bounded by $n$, the
number of events in the input sequence, as each event in the sequence is a
minimal occurrence of an episode of size 1. In the second iteration, an event
in the input sequence can start at most $|\mathcal{F}_1|$ minimal occurrences
of episodes of size 2. The space complexity of the second iteration is thus
$O(|\mathcal{F}_1|\,n)$.

While minimal occurrences of episodes can be located quite efficiently, 
the size of the data structures can be even larger than the original database, 
especially in the first couple of iterations. A practical solution is to use in the 
beginning other pattern matching methods, e.g., similar to the ones given for 
Winepi in Section 3, to locate the minimal occurrences. 

Finally, note that MINEPI can be used to solve the task of WINEPI.
Namely, a window contains an occurrence of an episode exactly when it
contains a minimal occurrence. The frequency of an episode $\alpha$ can thus
be computed from $mo(\alpha)$.
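
A small Python sketch illustrates how the number of windows containing an
episode follows from its minimal occurrences. It assumes integer time stamps
and windows $[s, s + win)$ that start at every integer $s$, and it ignores
the clipping of windows at the ends of the sequence; the function name
`windows_containing` is hypothetical. Dividing the count by the total number
of windows on the sequence would give the frequency.

```python
def windows_containing(mo_alpha, win):
    """Count windows of width `win` that contain a minimal occurrence.

    mo_alpha: list of half-open integer intervals (t_s, t_e).
    A window [s, s + win) contains [t_s, t_e) iff s <= t_s and t_e <= s + win.
    """
    starts = set()
    for t_s, t_e in mo_alpha:
        # Valid window starts are t_e - win, ..., t_s (empty if too wide).
        starts.update(range(t_e - win, t_s + 1))
    return len(starts)

# A single event at time 100, then two successive events at 100 and 101.
print(windows_containing([(100, 101)], 60))              # 60
print(windows_containing([(100, 101), (101, 102)], 60))  # 61
```

The two counts agree with the comparison of WINEPI and MINEPI for
single-event episodes in Section 5.3: one event is seen in 60 windows of
width 60 s, and two successive events in 61.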

4.3 Finding confidences of rules 

We now show how the information about minimal occurrences of frequent 
episodes can be used to obtain confidences for various types of episode rules 
without looking at the data again. 

Recall that we defined an episode rule as an expression
$\beta[win_1] \Rightarrow \alpha[win_2]$, where $\beta$ and $\alpha$ are
episodes such that $\beta \preceq \alpha$, and $win_1$ and $win_2$ are
integers. To find such rules, first note that for the rule to be frequent,
the episode $\alpha$ has to be frequent. So rules of the above form can be
enumerated by looking at all frequent episodes $\alpha$, and then looking at
all subepisodes $\beta$ of $\alpha$. The evaluation of the confidence of the
rule $\beta[win_1] \Rightarrow \alpha[win_2]$ can be done in one pass through
the structures $mo(\beta)$ and $mo(\alpha)$, as follows. For each
$[t_s, t_e) \in mo(\beta)$ with $t_e - t_s \le win_1$, locate the minimal
occurrence $[u_s, u_e)$ of $\alpha$ such that $t_s \le u_s$ and $[u_s, u_e)$
is the first interval in $mo(\alpha)$ with this property. Then check whether
$u_e - t_s \le win_2$.
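
The single pass can be sketched in Python with two indices that only move
forward. The sketch assumes both lists are sorted by start time (minimal
occurrences of one episode are then also sorted by end time, since none
contains another); the function name `confidence_one_pass` and the returned
pair of counts are illustrative choices only.

```python
def confidence_one_pass(mo_beta, mo_alpha, win1, win2):
    """Return (hits, total) for the rule beta[win1] => alpha[win2].

    Both arguments are lists of half-open intervals (start, end).
    """
    mo_beta = sorted(mo_beta)
    mo_alpha = sorted(mo_alpha)
    j = 0                 # index of the first alpha interval with u_s >= t_s
    hits = total = 0
    for t_s, t_e in mo_beta:
        if t_e - t_s > win1:
            continue      # left-hand side occurrence is too wide
        total += 1
        while j < len(mo_alpha) and mo_alpha[j][0] < t_s:
            j += 1        # never moves backwards, hence one pass overall
        if j < len(mo_alpha) and mo_alpha[j][1] - t_s <= win2:
            hits += 1
    return hits, total

# Minimal occurrences of beta and gamma from Example 7.
mo_beta = [(35, 38), (46, 48), (47, 58), (57, 60)]
mo_gamma = [(35, 39), (46, 51), (57, 62)]
print(confidence_one_pass(mo_beta, mo_gamma, 3, 4))  # (1, 3)
print(confidence_one_pass(mo_beta, mo_gamma, 3, 5))  # (3, 3)
```

The confidence is the ratio of the two counts; for the data of Example 7 this
reproduces the values 1/3 and 1 given in Example 8.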

The time complexity of the confidence computation for a given episode
and given time bounds $win_1$ and $win_2$ is
$O(|mo(\beta)| + |mo(\alpha)|)$. The confidences for all $win_1, win_2$ in
the set $W$ of time bounds can be found, using a table of size $|W|^2$, in
time $O(|mo(\beta)| + |mo(\alpha)| + |W|^2)$. For reasons of brevity we omit
the details.

The set $W$ of time bounds can be used to restrict the initial search of
minimal occurrences of episodes. Given $W$, denote the maximum time bound by
$win_{max} = \max(W)$. In episode rules, only occurrences of at most
$win_{max}$ time units can be used; longer episode occurrences can thus be
ignored already in the search of frequent episodes. We consider the support,
too, to be computed with respect to a given $win_{max}$.
5 Experiments



We have run a series of experiments using WINEPI and MINEPI. The general
performance of the methods, the effect of the various parameters, and the 
scalability of the methods are considered in this section. Consideration is 
also given to the applicability of the methods to various types of data sets. 

The experiments have been run on a PC with a 166 MHz Pentium processor
and 32 MB of main memory, under the Linux operating system. The sequences
resided in a flat text file. 

5.1 Performance overview 

For an experimental overview we discovered episodes and rules in a telecom- 
munication network fault management database. The database is a sequence 
of 73679 alarms covering a time period of 7 weeks. There are 287 different 
types of alarms with very diverse frequencies and distributions. On the av- 
erage there is an alarm every minute. However, the alarms tend to occur in 
bursts: in the extreme cases there are over 40 alarms in one second. 

We start by looking at the performance of the WINEPI method described
in Section 3. There are several performance characteristics that can be used 
to evaluate the method. The time required by the method and the number 
of episodes and rules found by the method, with respect to the frequency 
threshold or the window width, are possible performance measures. We 
present results for the two opposite extreme cases of the complexity: serial 
episodes and injective parallel episodes. 

Tables 1 and 2 represent performance statistics for finding frequent
episodes in the alarm database with various frequency thresholds. The number
of frequent episodes decreases rapidly as the frequency threshold increases.
With a given frequency threshold, the numbers of serial and injective parallel
episodes may be fairly similar, e.g., a frequency threshold of 0.002 results
in 151 serial episodes or 93 parallel episodes. The actual episodes are,
however, very different, as can be seen from the number of iterations: recall
that each iteration l produces episodes of size l. For the frequency threshold
of 0.002, the longest frequent serial episode consists of 43 events (all
candidates of the last iteration were infrequent), while the longest frequent
injective parallel episodes have 3 events. The number of iterations equals the
number of candidate generation phases. The number of database passes equals
the number of iterations, or is smaller by one when there were no candidates
in the last iteration.

  Frequency    Candidates   Frequent    Iterations   Total
  threshold                 episodes                 time (s)
  0.001        4528         359         45           680
  0.002        2222         151         44           646
  0.005         800          48         10           147
  0.010         463          22          7           110
  0.020         338          10          4            62
  0.050         288           1          2            22
  0.100         287           0          1            16

Table 1: Performance characteristics for serial episodes with WINEPI; alarm
database, window width 60 s.

  Frequency    Candidates   Frequent    Iterations   Total
  threshold                 episodes                 time (s)
  0.001        2122         185          5            49
  0.002        1193          93          4            48
  0.005         520          32          4            34
  0.010         366          17          4            34
  0.020         308           9          3            19
  0.050         287           1          2            15
  0.100         287           0          1            14

Table 2: Performance characteristics for injective parallel episodes with
WINEPI; alarm database, window width 60 s.
Figure 5: Number of frequent serial (solid line) and injective parallel (dotted
line) episodes as a function of the window width; WINEPI, alarm database,
frequency threshold 0.002.

The effect of the window width on the number of frequent episodes is 
represented in Figure 5. For each window width, there are considerably fewer 
frequent injective parallel episodes than frequent serial episodes. With the 
alarm data, the increase in the number of episodes is fairly even throughout 
the window widths that we considered. However, we will later show that this 
may depend heavily on the type of data we are using. 

Figure 6 represents the number of serial and injective parallel episodes 
found by the method, and Figure 7 the total processing time required, as 
the frequency threshold increases. Both curves decrease steeply with the 
increasing frequency threshold. The time requirement is much smaller for 
parallel episodes than for serial episodes with the same threshold. There 
are two reasons for this. The parallel episodes are considerably shorter (see 
Tables 1 and 2) and hence, fewer database passes are needed. The complexity 
of recognizing injective parallel episodes is also smaller. 
Figure 6: Number of frequent serial (solid line) and injective parallel (dotted
line) episodes as a function of the frequency threshold with WINEPI; alarm
database, window width 60 s.

Figure 7: Processing time for serial (solid line) and injective parallel
(dotted line) episodes as a function of the frequency threshold; WINEPI, alarm
database, window width 60 s.



5.2 Quality of candidate generation 

We now take a closer look at the candidates considered and frequent episodes
found during the iterations of the procedure. As an example, let us look at
what happens during the first iterations. Statistics of the first ten
iterations of a run with a frequency threshold of 0.001 and a window width of
60 s are shown in Table 3.

  Episode   Episodes    Candidates   Frequent    Match
  size                               episodes
   1        287         287           58         20 %
   2        82369       3364         137          4 %
   3        2·10^7      719           46          6 %
   4        7·10^9       37           24         64 %
   5        2·10^12      24           17         71 %
   6        6·10^14      18           12         67 %
   7        2·10^17      13           12         92 %
   8        5·10^19      13            8         62 %
   9        1·10^22       8            3         38 %
  10        4·10^24       3            2         67 %

Table 3: Number of candidate and frequent serial episodes during the first
ten iteration phases with WINEPI; alarm database, frequency threshold 0.001,
window width 60 s.

The first three iterations dominate the behavior of the method. During
these phases, the number of candidates is large, and only a small fraction
(less than 20 per cent) of the candidates turns out to be frequent. After the
third phase the candidate generation is efficient: few of the candidates are
found to be infrequent. Although the total number of iteration phases is 45,
the last 35 iterations involve only 1-3 candidates each. Thus we could safely
combine several of the later iteration steps, to reduce the number of database
passes.

If we take a closer look at the frequent episodes, we observe that all 
frequent episodes longer than 7 events consist of repeating occurrences of 
two very frequent alarms. Each of these two alarms occurs in the database 
more than 12000 times (16 per cent of the events each). 
  Support      Candidates   Frequent    Iterations   Total
  threshold                 episodes                 time (s)
    50         12732        2735        83           28
   100          5893         826        71           16
   250          2140         298        54           16
   500           813         138        49           14
  1000           589          92        48           14
  2000           405          64        47           13
  4000           352          53        46           12

Table 4: Performance characteristics for serial episodes with MINEPI; alarm
database, maximum time bound 60 s.

  Support      Candidates   Frequent    Iterations   Total
  threshold                 episodes                 time (s)
    50         10041        4856        89           30
   100          4376        1755        71           20
   250          1599         484        54           14
   500           633         138        49           13
  1000           480          89        48           12
  2000           378          66        47           12
  4000           346          53        46           12

Table 5: Performance characteristics for parallel episodes with MINEPI; alarm
database, maximum time bound 60 s.



5.3 Comparison of algorithms Winepi and Minepi 

Tables 4 and 5 represent performance statistics for finding frequent episodes
with MINEPI, the method using minimal occurrences. Compared to the
corresponding figures for WINEPI in Tables 1 and 2, we observe the same
general tendency for a rapidly decreasing number of candidates and episodes,
as the support threshold increases.

The episodes found by WINEPI and MINEPI are not necessarily the same.
If we compare the cases in Tables 1 and 4 with approximately the same
number of frequent episodes, e.g., 151 serial episodes for WINEPI and 138 for
MINEPI, we notice that they do not correspond to the same episodes. The
sizes of the longest frequent episodes are somewhat different (43 for the
original, 48 for the minimal occurrence method). The frequency threshold
0.002 for WINEPI corresponds, at the minimum, to about 150 instances of the
episode, while the support threshold used for MINEPI is 500. The difference
between the methods is very clear for small episodes. Consider an episode
$\alpha$ consisting of just one event A. WINEPI considers a single event A to
occur in 60 windows of width 60 s, while MINEPI sees only one minimal
occurrence. On the other hand, two successive events of type A result in
$\alpha$ occurring in 61 windows, but the number of minimal occurrences is
doubled from 1 to 2.

Figure 8 shows the time requirement for finding frequent episodes with
MINEPI. The processing time for MINEPI reaches a plateau when the size of
the maximal episodes no longer changes (in this case, at support threshold
500). The behavior is similar for serial and parallel episodes. The time
requirements of MINEPI should not be directly compared to WINEPI: the
episodes discovered are different, and our implementation of MINEPI works
entirely in the main memory. With very large databases this might not be
possible during the first iterations; either the minimal occurrences need to
be stored on the disk, or other methods (e.g., variants of Algorithms 4 and
5) must be used.

Figure 8: Processing time for serial (solid line) and injective parallel (dotted
line) episodes with MINEPI; alarm database, maximum time bound 60 s.

  Varying support threshold,           Varying number of time bounds,
  four time bounds                     support threshold 1000

  Support     Distinct   Rule gen.     Number of     All      Rule gen.
  threshold   rules      time (s)      time bounds   rules    time (s)
    50        50470      149            1             1221    13
   100        10809       29            2             2488    13
   250         4041       20            4             5250    15
   500         1697       16           10            11808    18
  1000         1221       15           20            28136    22
  2000         1082       14           30            42228    27
  4000         1005       14           60            79055    43

Table 6: Number of rules and rule generation time with MINEPI; alarm
database, serial episodes, support threshold 1000, maximum time bound 60 s,
confidence threshold 0.

5.4 Rules 

The methods can easily produce very large numbers of rules. Recall that rules
are constructed by considering all frequent episodes $\alpha$ as the
right-hand side and all subepisodes $\beta \preceq \alpha$ as the left-hand
side of the rule. Additionally, MINEPI considers variations of these rules
with all the time bounds in the given set $W$.

Table 6 represents results with serial episodes. The initial episode
generation with MINEPI took around 14 s, and the total number of frequent
episodes was 92. The table shows the number of rules obtained by MINEPI with
confidence threshold 0 and with maximum time bound 60 s. On the left, with
a varying support threshold, rules that differ only in their time bounds are
excluded from the figures; the rule generation time is, however, obtained by
generating rules with four different time bounds.
Figure 9: Total number of distinct rules found by MINEPI with various
confidence thresholds; alarm database, maximum time bound 60 s, support
threshold 100.

The minimal occurrence method is particularly useful if we are interested
in finding rules with several different time bounds. The right side of Table 6
represents performance results with a varying number of time bounds. The
time requirement increases slowly as more time bounds are used, and the
time increases more slowly than the number of rules.

The number of almost 80000 rules, obtained with 60 time bounds, may
seem unnecessarily large and unjustified. Remember, however, that there are
only 1221 distinct rules. The rest of the rules present different combinations
of time bounds, in this case down to the granularity of one second. For
the cost of 43 s we thus obtain very fine-grained rules from our frequent
episodes. Different criteria can then be used to select the most interesting
rules from these. Figure 9 represents the effect of the confidence threshold
on the number of distinct rules found by MINEPI. Although the initial number
of rules may be quite large, it decreases fairly rapidly if we require a
reasonable confidence.
  Data set   Events   Event   Supp.   Max       Conf.   Freq.   Rules
  name                types   thr.    time b.   thr.    epis.
  alarms      73679    287    100      60       0.8       826   6303
  WWW        116308   7634    250     120       0.2       454    316
  text1        5417   1102     20      20       0.2       127     19
  text2        2871    905     20      20       0.2        34      4
  protein      4941     22      7      10       -       21234      -

Table 7: Characteristic parameter values for each of the data sets and the
number of episodes and rules found by MINEPI.



5.5 Results with different data sets 

In addition to the experiments on the alarm database, we have run MINEPI
on a variety of different data collections to get a better view of the usefulness 
of the method. The data collections that were used, some typical parameter 
values for them, and some results are presented in Table 7. 

The WWW data is part of the WWW server log from the Department of 
Computer Science at the University of Helsinki. The log contains requests 
to WWW pages at the department's server; such requests can be made by 
WWW browsers at any host in the Internet. We consider the WWW page 
fetched as the event type. The number of events in our data set is 116308, 
covering three weeks in February and March, 1996. In total, 7634 different 
pages are referred to. Requests for images have been excluded from consid- 
eration. 

Suitable support thresholds vary a lot, depending on the number of events
and the distribution of event types. A suitable maximum time bound for the
device-generated alarm data is one minute, while the slower pace of a human
user requires using a larger time bound (two minutes or more) for the WWW
log. By using a relatively small time bound we reduce the probability of
unrelated requests contributing to the support. A low confidence threshold
for the WWW log is justified since we are interested in all fairly usual
patterns of usage, not only in the dominating ones. In the WWW server log we
found, e.g., long, often-used paths of pages from the home page of the
department to the pages of individual courses. Such behavior suggests that
rather than using a bookmark directly to the home page of a course, many
users quickly navigate there from the departmental home page.

The two text data collections are modifications of the same English text. 
Each word is considered an event, and the words are indexed consecutively 
to give a "time" for each event. The end of each sentence causes a gap 
in the indexing scheme, to correspond to a longer distance between words 
in different sentences. We used text from GNU man pages (the gnu awk
manual). The size of the original text (text1) is 5417 words, and the size of
the condensed text file (text2), where noninformative words such as articles,
prepositions, and conjunctions have been stripped off, is 2871 words. The
number of different words in the original and the condensed text is 1102 and
905, respectively.

For text analysis, there is no point in using large time bounds, since it
is unlikely that there is any connection between words that are not fairly
close to each other. This can be clearly seen in Figure 10, which represents
the number of episodes found on various window widths using WINEPI. This
figure reveals behavior that is distinctively different from the corresponding
Figure 5 for the alarm database. We observe that for the text data, the
window widths from 24 to 50 produce practically the same number of serial
episodes. The number of episodes will only increase with considerably larger
window widths. For this data, the interesting frequent episodes are smaller
than 24, while the episodes found with much larger window widths are noise.
The same phenomenon can be observed for parallel episodes.

Only a few rules can be found in text using a simple analysis like this.
The strongest rules in the original text involve either the word "gawk", or
common phrases such as

the, value [2] => of [3] (confidence 0.90)

meaning that in 90 % of the cases where the words "the value" are
consecutive, they are immediately followed by the preposition "of". These
rules were not found in the condensed text since all prepositions and articles
have been stripped off. The few rules in the condensed text contain multiple
occurrences of the word "gawk", or combinations of words occurring in the
header of each man page, such as "free software".

Figure 10: Number of serial (solid line) and injective parallel (dotted line)
episodes as a function of the window width; WINEPI, compressed text data
(text2), frequency threshold 0.02.

We performed scale-up tests with 5, 10, and 20 fold multiples of the 
compressed text file, i.e., sequences of approximately 2900 to 58000 events. 
The results in Figure 11 show that the time requirement is roughly linear 
with respect to the length of the input sequence, as could be expected. 

Finally, we experimented with protein sequences. We used data in the 
PROSITE database [1] of the ExPASy WWW molecular biology server of the 
Geneva University Hospital and the University of Geneva [11]. PROSITE 
contains biologically significant DNA and protein patterns that help to 
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Figure 11: Scale-up results for serial (solid line) and injective parallel (dotted 
line) episodes with MlNEPi; compressed text data, maximum time bound 60, 
support threshold 10 for the smallest file (n-fold for the larger files). 

identify to which family of protein (if any) a new sequence belongs. The 
purpose of our experiment is to evaluate our algorithm against an external 
data collection and patterns that are known to exist, not to find patterns 
previously unknown to the biologists. We selected as our target a family of 
7 sequences ("DNA mismatch repair proteins 1", PROSITE entry PS00058). 
The sequences in the family are known to contain the string GFRGEAL of seven 
consecutive symbols. We transformed the data in a manner similar to the
English text: symbols are indexed consecutively, and between the protein
sequences we place a gap. The total length of this data set is 4941 events,
with an alphabet of 22 event types. The method could easily be modified to
take several separate sequences as input, and to compute the support of an
episode α, e.g., as the number of input sequences that contain a (minimal)
occurrence of α of length at most the maximum time bound.
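
The per-sequence support suggested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the names `shortest_span` and `sequence_support` and the toy sequences are hypothetical, and only the motif GFRGEAL is taken from the text. Since symbols are indexed consecutively, the length of an occurrence is taken here to be the number of positions it covers.

```python
from typing import Optional, Sequence


def shortest_span(seq: str, episode: str) -> Optional[int]:
    """Number of positions covered by the shortest occurrence of a serial
    episode in `seq`, or None if the episode does not occur at all.

    An occurrence is a subsequence match: the symbols of `episode` appear
    in `seq` in order, possibly with other symbols in between.
    """
    if not episode:
        raise ValueError("episode must contain at least one symbol")
    best = None
    for end, symbol in enumerate(seq):
        if symbol != episode[-1]:
            continue
        # Match the rest of the episode backwards from `end`, always taking
        # the latest possible position; this gives the latest start for `end`.
        pos = end
        for wanted in reversed(episode[:-1]):
            pos = seq.rfind(wanted, 0, pos)  # latest index strictly before pos
            if pos < 0:
                break
        if pos < 0:
            continue
        span = end - pos + 1
        if best is None or span < best:
            best = span
    return best


def sequence_support(sequences: Sequence[str], episode: str, max_len: int) -> int:
    """Number of input sequences with an occurrence of length <= max_len."""
    count = 0
    for seq in sequences:
        span = shortest_span(seq, episode)
        if span is not None and span <= max_len:
            count += 1
    return count


# Hypothetical toy sequences, not PROSITE data.
toy = ["MKGFRGEALSA", "GFRGEAL", "GAFRGEAL", "GFRG"]
print(sequence_support(toy, "GFRGEAL", 7))   # only occurrences without gaps
print(sequence_support(toy, "GFRGEAL", 10))  # up to three extra symbols allowed
```

Counting sequences rather than windows makes the support threshold directly comparable to the number of sequences in the family.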

The parameter values for the protein database are chosen on purpose to 
reveal the pattern that is known to be present in the database. The window 
width was selected to be 10, i.e., slightly larger than the length of the pattern
that we were looking for, and the support threshold was set to 7, for the
seven individual sequences in the original data. With this data, we are only 
interested in the longest episodes (of length 7 or longer). Of the more than 
20000 episodes found, 17 episodes are of length 7 or 8. As expected, these 
contain the sequence GFRGEAL that was known to be in the database. The 
longer episodes are variants of this pattern with an eighth symbol fairly near, 
but not necessarily immediately subsequent to the pattern (e.g., GFRGEAL*S). 
These types of patterns belong to the pattern class used in PROSITE but, to 
our surprise, these longer patterns are not reported in the PROSITE database.

6 Extensions and related work 

The task of discovering frequent parallel episodes can be stated as a task
of discovering all frequent sets, a central phase of discovering association
rules [2]. The rule generation methods are also basically the same for
association rules and WINEPI. The levelwise main algorithm has also been used
successfully in the search for frequent sets [3].
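
The correspondence with frequent sets can be made concrete with a small sketch. It assumes integer time stamps, half-open windows [t, t + win) for every start t that overlaps the sequence, and injective parallel episodes (sets of event types without repetition), so that each window can be treated as one "transaction". The function names and the toy data are hypothetical, and the code favors brevity over the efficiency of the algorithms described in this paper.

```python
from itertools import combinations


def window_type_sets(events, win):
    """Set of event types seen in each window of width `win`.

    `events` is a list of (event_type, time) pairs with integer times.
    Windows are half-open intervals [t, t + win); every window that
    overlaps the sequence is included, also the partial ones at both ends.
    """
    times = [t for _, t in events]
    first, last = min(times), max(times)
    return [
        frozenset(a for a, t in events if start <= t < start + win)
        for start in range(first - win + 1, last + 1)
    ]


def frequent_parallel_episodes(events, win, min_fr):
    """Levelwise search: a set is a candidate only if all its subsets
    that are one element smaller were found frequent on the previous level."""
    windows = window_type_sets(events, win)
    n = len(windows)
    candidates = {frozenset([a]) for a, _ in events}  # level 1: single types
    frequent = {}
    size = 1
    while candidates:
        level = {}
        for cand in candidates:
            fr = sum(1 for w in windows if cand <= w) / n
            if fr >= min_fr:
                level[cand] = fr
        frequent.update(level)
        # Build the next level from the types that are still frequent.
        types = sorted(set().union(*level))
        size += 1
        candidates = {
            frozenset(c)
            for c in combinations(types, size)
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical toy sequence of (event type, time) pairs.
toy_events = [("A", 1), ("B", 2), ("A", 4), ("B", 5), ("C", 8)]
result = frequent_parallel_episodes(toy_events, win=3, min_fr=0.4)
for episode, fr in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(episode), fr)
```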

Technical problems related to the recognition of episodes have been re- 
searched in several fields. Taking advantage of the slowly changing contents 
of the group of recent events has been studied, e.g., in artificial intelligence, 
where a similar problem in spirit is the many pattern/many object pat- 
tern match problem in production system interpreters [9]. Also, comparable 
strategies using a sliding window have been used, e.g., to study the locality 
of reference in virtual memory [7]. Our setting differs from these in that 
our window is a queue with the special property that we know in advance 
when an event will leave the window; this knowledge is used by WINEPI in
the recognition of serial episodes. In MINEPI, we take advantage of the fact
that we know where subepisodes of candidates have occurred. 
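
The queue view of the window can be illustrated for the simpler case of an injective parallel episode. The sketch below is not the recognition procedure for serial episodes referred to above; it only shows how consecutive windows share most of their contents, and that an event with time t leaves as soon as the window start passes t. Integer time stamps, half-open windows [t, t + win), and all names are assumptions made for the example.

```python
from collections import Counter, deque


def parallel_episode_frequency(events, episode_types, win):
    """Fraction of windows [t, t + win) that contain every type in
    `episode_types`, computed by sliding the window one time unit at a time.

    `events` is a list of (event_type, time) pairs sorted by integer time.
    """
    wanted = set(episode_types)
    first, last = events[0][1], events[-1][1]
    window = deque()        # events currently inside the window, oldest first
    counts = Counter()      # occurrences of each wanted type in the window
    missing = len(wanted)   # wanted types currently absent from the window
    nxt = 0                 # index of the next event that has not entered yet
    hits = total = 0
    for start in range(first - win + 1, last + 1):
        end = start + win
        # Events with time < end enter at the back of the queue.
        while nxt < len(events) and events[nxt][1] < end:
            etype, t = events[nxt]
            window.append((etype, t))
            if etype in wanted:
                counts[etype] += 1
                if counts[etype] == 1:
                    missing -= 1
            nxt += 1
        # Events with time < start leave from the front of the queue.
        while window and window[0][1] < start:
            etype, _ = window.popleft()
            if etype in wanted:
                counts[etype] -= 1
                if counts[etype] == 0:
                    missing += 1
        total += 1
        if missing == 0:
            hits += 1
    return hits / total


# Hypothetical toy sequence of (event type, time) pairs.
toy_events = [("A", 1), ("B", 2), ("A", 4), ("B", 5), ("C", 8)]
print(parallel_episode_frequency(toy_events, {"A", "B"}, win=3))
```

Each event is appended and removed exactly once, so the work per window shift depends on the number of events entering or leaving, not on the window contents as a whole.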

The recent work on sequence data in databases (see [21]) provides
interesting openings towards the use of database techniques in the processing
of queries on sequences. A problem similar to the computation of frequencies
also occurs in the area of active databases. There, triggers can be specified
as composite events, somewhat similar to episodes. In [10] it is shown how 
finite automata can be constructed from composite events to recognize when 
a trigger should be fired. This method is not practical for episodes since the 
deterministic automata could be very large. 

The methods for matching sets of episodes against a sequence have some 
similarities to the algorithms used in string matching (e.g., [12]). In par- 
ticular, recognizing serial episodes in a sequence can be seen as locating all 
occurrences of subsequences, or matches of patterns with variable length 
don't care symbols, where the length of the occurrences is limited by the 
window width. Learning from a set of sequences has received considerable 
interest in the field of bioinformatics, where an interesting problem is the 
discovery of patterns common to a set of related protein or amino acid se- 
quences. The classes of patterns differ from ours; they can be, e.g., substrings 
with fixed length don't care symbols [15]. Closer to our patterns are those 
considered in [24]. The described algorithm finds patterns that are similar 
to serial episodes; however, the patterns have a given minimum length, and 
the occurrences can be within a given edit distance. Recent results on the 
pattern matching aspects of recognizing episodes can be found in [6]. 

The work most closely related to ours is perhaps [4]. There multiple 
sequences are searched for patterns that are similar to the serial episodes 
with some extra restrictions and an event taxonomy. Our methods can be 
extended with a taxonomy by a direct application of the similar extensions to 
association rules [13, 14, 22]. Also, our methods can be applied to analyzing
several sequences; there is actually a variety of choices for the definition of
frequency of an episode in a set of sequences. More recently, the pattern
class of [4] has been extended with windowing, some extra time constraints,
and an event taxonomy [23]. For a survey on patterns in sequential data,
see [17].

In stochastics, event sequence data is often called a marked point pro- 
cess [16]. It should be noted that traditional methods for analyzing marked 
point processes are ill suited for the cases where the number of event types 
is large. However, there exists an interesting combination of techniques: fre- 
quent episodes are discovered first, and then the phenomena they describe 
are analyzed in more detail with methods for marked point processes. 

There are also some interesting similarities between the discovery of fre- 
quent episodes and the work done on inductive logic programming (see, 
e.g., [20]); a noticeable difference is caused by the sequentiality of the under- 
lying data model, and the emphasis on time-limited occurrences. Similarly, 
the problem of looking for one occurrence of an episode can be viewed as a 
constraint satisfaction problem. 

The class of patterns discovered can easily be modified in several
directions. Different windowing strategies could be used, e.g., considering
only windows starting every win' time units for some win', or windows starting
from every event. Other types of patterns could also be searched for, e.g.,
substrings with fixed length don't care symbols; searching for episodes in
several sequences is no problem. A more general framework for episode
discovery has been presented in [18]. There episodes are defined as combinations
of events satisfying certain user specified unary or binary conditions.
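
The alternative windowing strategies differ only in which window start times are considered. A minimal sketch, assuming integer time stamps and half-open windows [t, t + win); the function name, the strategy labels, and the parameter `step` (standing for the spacing between window starts) are hypothetical:

```python
def window_starts(times, win, strategy="every_unit", step=1):
    """Start times of the windows [t, t + win) under three strategies.

    every_unit  : a window starts at every time unit overlapping the sequence
    every_step  : a window starts only every `step` time units
    every_event : a window starts at the time of every event
    """
    first, last = min(times), max(times)
    if strategy == "every_unit":
        return list(range(first - win + 1, last + 1))
    if strategy == "every_step":
        return list(range(first - win + 1, last + 1, step))
    if strategy == "every_event":
        return sorted(set(times))
    raise ValueError(f"unknown strategy: {strategy}")


def frequency(events, episode_types, win, starts):
    """Fraction of the given windows containing all types of a parallel episode."""
    wanted = set(episode_types)
    hits = 0
    for start in starts:
        seen = {a for a, t in events if start <= t < start + win}
        if wanted <= seen:
            hits += 1
    return hits / len(starts)


# Hypothetical toy sequence of (event type, time) pairs.
toy_events = [("A", 1), ("B", 2), ("A", 4), ("B", 5), ("C", 8)]
toy_times = [t for _, t in toy_events]
for name in ("every_unit", "every_step", "every_event"):
    starts = window_starts(toy_times, 3, name, step=2)
    print(name, starts, frequency(toy_events, {"A", "B"}, 3, starts))
```

The same episode can obtain a different frequency under each strategy, so thresholds are not directly comparable across strategies.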

7 Conclusions 

We presented a framework for discovering frequent episodes in sequential 
data. The framework consists of defining episodes as partially ordered sets 
of events, and looking at windows on the sequence. We described an
algorithm, WINEPI, for finding all episodes from a given class of episodes that
are frequent enough. The algorithm was based on the discovery of episodes
by only considering an episode when all its subepisodes are frequent, and 
on incremental checking of whether an episode occurs in a window. The 
implementation shows that the method is efficient. We have applied the
method in the analysis of the alarm flow from telecommunication networks, 
and discovered episodes have been embedded in alarm handling software. 

We also presented an alternative approach, MINEPI, to the discovery of
frequent episodes, based on minimal occurrences of episodes. This approach 
supplies more power for representing connections between events, as it pro- 
duces rules with two time bounds. 
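
To illustrate what minimal occurrences and rules with two time bounds might look like in code, the following sketch handles serial episodes only. It rests on several assumptions that are ours, not definitions quoted from this paper: time stamps are distinct integers, an occurrence is reported as the pair (time of first event, time of last event), a time bound limits the difference of these two times, and the confidence of a rule is read as the fraction of sufficiently short minimal occurrences of the left-hand side that are followed, within the second bound, by an occurrence of the right-hand side. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[str, int]  # (event type, time)


def minimal_occurrences(events: Sequence[Event], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Minimal occurrences of a non-empty serial episode as (ts, te) pairs.

    `events` must be sorted by strictly increasing time.
    """
    result: List[Tuple[int, int]] = []
    for end, (etype, te) in enumerate(events):
        if etype != episode[-1]:
            continue
        # Match backwards, taking the latest possible event at each step;
        # this yields the latest start among occurrences ending at `end`.
        pos = end
        for wanted in reversed(episode[:-1]):
            pos -= 1
            while pos >= 0 and events[pos][0] != wanted:
                pos -= 1
            if pos < 0:
                break
        if pos < 0:
            continue
        ts = events[pos][1]
        # The latest start never decreases as the end moves right, so the
        # occurrence is minimal exactly when its start is new.
        if not result or ts > result[-1][0]:
            result.append((ts, te))
    return result


def rule_confidence(events, beta, win1, alpha, win2) -> Optional[float]:
    """Confidence of 'beta within win1 implies alpha within win2' under the
    reading described in the lead-in; None if beta never occurs within win1."""
    mo_beta = [(ts, te) for ts, te in minimal_occurrences(events, beta) if te - ts <= win1]
    if not mo_beta:
        return None
    mo_alpha = minimal_occurrences(events, alpha)
    hits = sum(
        1
        for ts, _ in mo_beta
        if any(us >= ts and ue - ts <= win2 for us, ue in mo_alpha)
    )
    return hits / len(mo_beta)


# Hypothetical toy sequence of (event type, time) pairs.
toy_events = [("A", 1), ("B", 2), ("C", 3), ("A", 6), ("B", 9),
              ("A", 11), ("B", 12), ("C", 16)]
print(minimal_occurrences(toy_events, "AB"))
print(minimal_occurrences(toy_events, "ABC"))
print(rule_confidence(toy_events, "AB", 2, "ABC", 3))
```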

Both rule formalisms have their advantages. While the rules of MINEPI
are often more informative, the frequencies and confidences of the rules of
WINEPI have nice interpretations as probabilities concerning randomly chosen
windows. To a large extent the algorithms are similar; there are significant
differences only in the computation of the frequency or support. Roughly,
a general tendency in the performance is that WINEPI can be more efficient
in the first phases of the discovery, mostly due to its smaller space requirement.
In the later iterations, MINEPI is likely to outperform WINEPI clearly. The
methods can be modified for cross-use, i.e., WINEPI for finding minimal
occurrences and MINEPI for counting windows, and for some large problems,
whether the rule type is that of WINEPI or MINEPI, a mixture of the two methods
could give better performance than either alone.

Interesting extensions to the work presented here are facilities for rule 
querying and compilation, i.e., methods by which the user could specify the 
episode class in a high-level language and the definition would automatically
be compiled into a specialization of the algorithm that would take advantage 
of the restrictions on the episode class. Other open problems include the 
combination of episode techniques with marked point processes and intensity 
models.